Workshop 1: An Introduction to RStudio


Objectives

  • To familiarise yourself with RStudio and understand what each panel does.
  • To set up a project and create an R Script.

Introduction — R you Ready?

Welcome to the world of R. Firstly, you’ll need access to R. You can download R and RStudio here. There are two steps to installation: first you need to install R, and second you need to install RStudio.

R and RStudio are not the same thing:

  • R is a coding language primarily used for statistical computing and data visualisation, although it can do many different things! Think of R as a written language. Like any language we need somewhere to write it down…
  • RStudio is a place you can write down and run your code! It is an R toolbox or workspace that provides a user-friendly interface for writing and running your R code.
Think of R code as a recipe, or a set of instructions telling someone how to make a delicious meal, whereas RStudio is your kitchen, filled with all the tools and equipment you can interact with. It allows you to organise your ingredients (data), mix them together (writing and running code), and see the tasty results (viewing outputs and plots).

Why are you making us learn R?

Over the last decade, R has become the “go to” tool to help carry out data analysis in psychological research. R is free and open source. Coding is a highly desirable and transferable skill. However, you don’t need to become an advanced genius coder. Gaining an understanding of how coding works to help you organise, analyse, and present data will be enough for a psychology undergraduate degree. There are lots of reasons to use R in psychological research and you can read more about them here.

RStudio

When you open RStudio you should see a window that looks like this:

You will see three sections:

  1. The console is the largest panel on the left. This is where R will produce any written output for you to read and make sense of – almost like a printer.

  2. The environment is the top right panel. This is where R keeps a list of any data you are working with – almost like R’s memory.

  3. The files panel on the bottom right does a few things, as it has a few different tabs. I’ll talk through the most commonly used tabs:

    • Files allows you to navigate to files on your computer. Plots is an important one, as it is where R produces any data visuals (e.g., graphs or formatted tables). The Packages tab helps you manage your packages.
What’s a package? Think of a package as an “add on” or an “app” you can add into R. It is a toolbox with various functions to help you run different analyses. You need to install packages as and when you need them. More on this later…

Creating a Project

You can use “projects” in R, which help keep your work tidy and organised. It also means you can save your work and come back to it any time in the future. If you are a PS2010 student at Royal Holloway, I’d recommend you create one project for all of your PS2010 work in R. Make sure you save it somewhere sensible; for example, if you’re using a campus PC, save it onto your Y: Drive.

How to create a new project:

  1. In the top right corner, find and click on the blue button which says “Project: (None)”.

  2. Select “New Directory”.

  3. Select “New Project”.

  4. Give your project a suitable name. Royal Holloway students, I recommend you call it the module code (e.g., ps1010 or ps2010).

  5. Now you are ready to create a new script. Look for the new file button (a white page with a green plus sign) in the top left corner of your screen, click on it and select “R Script”.

  6. If you have followed those steps correctly, a new panel should open up in RStudio.

This new panel on the top left is the “script” panel. This is where you can enter your code – think of the script panel as an input panel. Helpfully, you can save your script at any time, which means you can come back to your code at a later date.


Time to Start Coding in R

Let’s begin with some very simple coding in the script panel. In your new script panel add the following:

# My first R script

A useful thing to know is that anything you write after # (the hashtag symbol) is called a “comment”. This is a way to keep notes that R won’t read. R will ignore anything that comes after a hashtag on that line. Think of these as human notes, ignored by the computer!

Annotating your code with comments is a really good habit to get into because it means you have the comment to look at in the future, almost like revision notes to remind you what each line of code does. It means when you come back to re-run or recycle code in the future, you can figure out what it does quickly.
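
Here is a small sketch of the two places a comment can go: on its own line, or at the end of a line of code. The object name total_minutes and the numbers are made up purely for illustration.

```r
# A comment on its own line: R skips this line completely

total_minutes = 60 + 30  # A comment at the end of a line: R runs the code, then ignores the note
print(total_minutes)

# print("R will not run this line because it starts with a hashtag")
```

If you run all of these lines, R should only print the number 90. The last print() never runs because the whole line is a comment.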

Now let’s get R to work. In the script panel enter the following:

date()

STOP! You might remember from above I mentioned that using comments to annotate your code is best practice. Well straight away I have ignored my own advice. Let’s try that again:

date() # Ask R what today's date is

I suppose it is easy to work out what date() does without the comment. But as your code gets more complicated, using comments will become so important!

How to run the code:

There are a few ways to run code in R. The easiest is probably to make sure your cursor is at the end of the line you’ve just written and then press CTRL + ENTER if you’re on a Windows PC or COMMAND + ENTER if you’re on a Mac. Give this a go now and R should tell you the date (check the console – bottom left panel).

Why not have a go at asking R something else? R is essentially a very sophisticated and fancy calculator, so let’s try some basic and more complex maths sums to get you used to running lines of code. Try out the sums below and run each one, or add any sum you like.

2+2 # Ask R a maths question
25*80 # Ask R a slightly more tricky question
1234/92 # Ask R a really tricky question

Just a note on symbols for maths questions:

+ will add.

- will subtract.

* will multiply.

/ will divide.
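
R also follows the usual order of operations, so multiplying and dividing happen before adding and subtracting unless you use brackets. The lines below are a quick sketch of that. The object name answer and the use of round() are my own additions for illustration.

```r
10 - 4              # Subtract: gives 6
2 + 3 * 4           # Multiplication happens before addition: gives 14
(2 + 3) * 4         # Brackets are worked out first: gives 20

answer = 1234 / 92  # Store the result of the "really tricky question" in an object
round(answer, 2)    # Round it to 2 decimal places: gives 13.41
```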

Everyone Loves a Compliment

Who doesn’t love a compliment? Let’s create a random compliment generator in R. Enter and run the code below to receive a compliment:

compliments = c("You're awesome!", "You're a coding superstar!",
    "Keep on slaying this workshop!", "You're the best")
random_compliment = sample(compliments, 1)
print(random_compliment)

If you want to know what each part of the compliment generator code does, I’ll explain below:

  • compliments = c() created a vector, in this case a small data set with four compliments.

  • random_compliment = sample(compliments, 1) asked R to create an object in R which sampled one of the compliments from the vector.

  • print(random_compliment) printed the sampled compliment.
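
If you’d like to poke at sample() a bit more, here is a rough sketch. It assumes the same compliments vector as above. The seed value 2024 and the object names first_pick, second_pick and two_compliments are just picked for illustration.

```r
compliments = c("You're awesome!", "You're a coding superstar!",
    "Keep on slaying this workshop!", "You're the best")

length(compliments)   # How many compliments are in the vector? Gives 4

set.seed(2024)        # Optional: fixing the "seed" makes the random pick repeatable
first_pick = sample(compliments, 1)
set.seed(2024)        # Same seed again...
second_pick = sample(compliments, 1)
first_pick == second_pick   # ...so R makes the same pick: gives TRUE

two_compliments = sample(compliments, 2)   # Ask for two different compliments at once
print(two_compliments)
```

Without set.seed(), R makes a fresh random pick every time you run sample(), which is what you want for a compliment generator.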


Optional Exercise

Copy and paste the code for the random compliment generator and adapt it to create a random insult generator! Playing around with and tweaking code is a fun way to improve your skills.

Hints
  • Change compliments to insults.
  • Change the compliments inside the speech marks to your chosen fun insults.
  • Change random_compliment to random_insult.
  • Change sample(compliments, 1) to sample(insults, 1).
  • Change print(random_compliment) to print(random_insult).

PS1010 students: That’s it for today’s workshop! Remember to save your R project and script. Check where your project has been saved on your Y: Drive so you know where to find it in the future. We will re-visit this project next week. You will also need these resources for the quiz.

Workshop 2: Getting Data into RStudio


Objectives

  • To set up the working directory in RStudio.
  • To import some data into RStudio.
  • Use R code to produce a simple plot.

Introduction

In the previous section you learnt what RStudio looks like and how to write and execute some fairly basic lines of code. Now it is time to import some data into R to begin working with it. RStudio likes a particular format of data file: the comma-separated values format, or .csv. You can save Excel files as .csv files. There are also some other important rules about how data should be laid out in an Excel file, but we will come back to those later.

For now, the aim is to get a pre-existing data set into R and to produce a simple graph to display the data.


R is very particular. This means when you enter code, it needs to be entered perfectly, with symbols and letters placed exactly where they should be. R code is also case sensitive!

Most error messages you see when first learning to code tend to come from typos. Do take your time and make sure that the code has been copied exactly as it should be – any symbol or space in the wrong place will make R upset.
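
To see case sensitivity in action, here is a tiny sketch. The object happiness_score is made up for this example. exists() simply asks R whether it knows an object with exactly that name.

```r
happiness_score = 7          # A made-up object, named using lower-case letters only

exists("happiness_score")    # TRUE: the name matches exactly
exists("Happiness_Score")    # FALSE: capital letters make it a different name

# Typing Happiness_Score on its own would stop with a message along the lines of:
# Error: object 'Happiness_Score' not found
```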


Loading Packages

A helpful first step is to load any packages that you might need. Packages are essentially “add-ons” that you can use within R. There are loads of packages out there that will allow you to carry out various tasks in R. For now we need one called tidyverse. Use the code below:

install.packages("tidyverse")
library("tidyverse")
  • install.packages() is a command used to ask RStudio to install a package you want.
  • library() is how we ask RStudio to load up the package (to open it in R). You might see some red text in the console when you do this, but it is usually just messages or warnings (for example, about which version of R was used to build the package(s)) rather than an error.
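
One thing worth knowing is that install.packages() downloads and installs the package every time it runs, even though you only need to install a package once per computer. As a rough sketch, you could check first and only install when the package is missing. Note that install_if_missing is a made-up helper name, not a function that comes with R. I’ve tried it here on "stats", which ships with R, so nothing gets downloaded.

```r
# Hypothetical helper (not part of R or tidyverse):
# installs a package only if it is not already on this computer
install_if_missing = function(pkg) {
  already_installed = requireNamespace(pkg, quietly = TRUE)  # TRUE if the package can be found
  if (!already_installed) {
    install.packages(pkg)
  }
  invisible(already_installed)  # Quietly hand back whether it was already there
}

was_installed = install_if_missing("stats")  # "stats" comes with R, so no download happens
print(was_installed)
```

For this workshop you would use install_if_missing("tidyverse") instead, followed by library("tidyverse") as usual. You still need library() every time you start a new R session.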

The Working Directory

The working directory can be any folder on your computer. It’s important to know which folder this is, as this is where RStudio will look for any data files. You can check the working directory at any time using:

getwd()  # Asks R to check the current working directory.

Give that a try now. As you followed the previous steps of “Creating a Project” earlier on (scroll up!), hopefully your working directory is already set to that same project directory. However, if not, or if you want to change your working directory, you can do so following these steps:

  1. Click the “Session” tab along the top of the RStudio window.

  2. Click “Set Working Directory”.

  3. Click “Choose Directory”.

  4. A window will pop up and you need to select the folder on your computer that you want to set as your working directory.

You can double check this has worked by using getwd() at any time.

It is vital you know where the working directory is, as any external data files you use must be saved into the working directory.


Getting the Data into R

For today’s example you need to download the corr.csv data set. PS1010 students can find this on Moodle or use this link. Make sure the file is saved into your working directory, and that it is saved as a .csv file. You can open the file to look at it (in fact, I encourage you to do this), but do not save it again as a different file type; it must stay as .csv.
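
If you’re not sure whether the file ended up in the right place, you can ask R directly. This is just a quick sketch using functions that come with R. The object names current_folder, files_here and found_it are my own.

```r
current_folder = getwd()             # Which folder is R currently looking in?
print(current_folder)

files_here = list.files()            # Which files can R see in that folder?
print(files_here)

found_it = file.exists("corr.csv")   # TRUE if corr.csv is in the working directory, FALSE if not
print(found_it)
```

If the last line prints FALSE, the file is either in a different folder or has a slightly different name (remember, R is picky about exact spelling).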

What is .csv?

R likes to work with a particular type of data file. When we finish an experiment and have collected all the data, we need to store that data somewhere, usually an Excel spreadsheet. R prefers these spreadsheets to be saved as a .csv (comma-separated values) file type. You don’t need to change the file this time around as it is already a .csv file, but for the future it is really easy to save any Excel file as a .csv. If you want to know how to do this, take a look here.
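
To show what “comma-separated values” actually means, here is a small sketch that writes a two-row data set to a temporary .csv file and then reads it back as plain text. The names tiny_data, csv_path and csv_lines are made up for this example. It uses write.csv(), which comes with R, and a temporary file so nothing in your project folder changes.

```r
# A tiny example data set using the same two column names as the workshop data
tiny_data = data.frame(time_outside = c(120, 55), happiness = c(8, 5))

csv_path = tempfile(fileext = ".csv")              # A throwaway file location
write.csv(tiny_data, csv_path, row.names = FALSE)  # Save the data as a .csv file

csv_lines = readLines(csv_path)                    # Read the file back as plain text
writeLines(csv_lines)                              # Show the text exactly as it is stored
```

You should see the column names on the first line, then one line per row, with the values separated by commas (for example 120,8). That really is all a .csv file is.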

Import Data Using read_csv()

The next step is to get the data from the .csv file into RStudio. Providing you have downloaded the file correctly, and saved it in .csv format in your working directory, the following code should work:

mydata = read_csv("corr.csv")  # Asks R to read the file and save as an object called 'mydata'

If successful, you should now be able to see mydata appear in the top right panel, under the environment tab. This means your data is now in RStudio.

Having problems with the data file?

Use this code to manually add the data as a data frame.

# Create the corr.csv data frame manually without using read_csv
mydata <- data.frame(time_outside = c(120, 101, 85, 55, 41, 22, 123, 90, 66, 32,
    12, 130, 90, 50, 33, 70, 65, 54, 111, 24, 11, 115, 9, 80, 129), happiness = c(8,
    7, 6, 5, 4, 1, 10, 5, 6, 4, 5, 6, 4, 3, 5, 7, 7, 6, 8, 3, 4, 10, 2, 6, 10))

The Data

First it might be useful to know what the data set consists of. As I’m writing this it is nearly spring (2024) and I’m looking forward to spending more time outside. It makes me happy. Well, anyway, the data set for this exercise contains two variables:

  1. Number of minutes outside called time_outside.
  2. How happy you are on a scale of one to ten called happiness.
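
Before plotting anything, it is worth having a quick look at the data. The sketch below uses a few functions that come with R. To keep it self-contained, it builds mydata from the same numbers as the manual data frame earlier, but the checks work the same way if you imported the file with read_csv().

```r
# The workshop data, typed in by hand (same values as the manual data frame above)
mydata = data.frame(time_outside = c(120, 101, 85, 55, 41, 22, 123, 90, 66, 32,
    12, 130, 90, 50, 33, 70, 65, 54, 111, 24, 11, 115, 9, 80, 129), happiness = c(8,
    7, 6, 5, 4, 1, 10, 5, 6, 4, 5, 6, 4, 3, 5, 7, 7, 6, 8, 3, 4, 10, 2, 6, 10))

names(mydata)     # The variable names: time_outside and happiness
nrow(mydata)      # How many rows of data are there? Gives 25
head(mydata)      # Show the first six rows
summary(mydata)   # Minimum, maximum, mean and so on for each variable
```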

Our question today is does spending more time outside make you happier?


Creating a Simple Scatter Plot

A scatter plot is a type of graph that shows a potential relationship between two variables, for example, as one thing increases does another thing also increase? Or maybe there is no relationship at all? We can graphically look at this using a scatter plot. Let’s plot our time outside and happiness data. Below is the basic code.

ggplot(data, aes(x = x_var, y = y_var)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Scatter Plot with Line of Best Fit",
       x = "X-axis Label",
       y = "Y-axis Label")
  • data is where you can tell RStudio the name of your data set. Earlier we called it mydata so this will need changing.
  • x_var is where you should enter the name of your X-axis variable exactly as it appears in the .csv file. It should be time_outside.
  • y_var is where you should enter the name of your y-axis variable exactly as it appears in the .csv file. It should be happiness.
  • x = "X-axis Label" and y = "Y-axis Label" are where you tell RStudio what text labels you want to display on each axis. For the x axis I’d go for "Time Spent Outside Per Week (mins)" and for the y axis I’d go for "Happiness Score", as these match the variables you put on the x and y axes in the first line of the code (above). Make sure you use speech marks, as this is a piece of text that will be shown on the graph.
  • title = "Scatter Plot with Line of Best Fit", inside labs(), is where your figure title is set. You can amend this to whatever is appropriate, or remove it; for example, if you’re presenting an APA formatted figure you wouldn’t need a title.

Have a go at amending the above code yourself to match the data set. Once you’ve done this, check your version against the correct answer below.

Remember that R code is case sensitive, so you have to write things exactly as they appear in the .csv file. Remember the golden rule!

ggplot(mydata, aes(x = time_outside, y = happiness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Scatter Plot with Line of Best Fit",
       x = "Time Spent Outside Per Week (mins)",
       y = "Happiness Score")

If you want the figure without the title, use the code below. Can you spot which part has been removed so that the title no longer shows?

ggplot(mydata, aes(x = time_outside, y = happiness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(x = "Time Spent Outside Per Week (mins)",
       y = "Happiness Score")

Once you have amended the code, run it and check out the “Plots” tab in the bottom right panel of RStudio.
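
If the + symbols feel a bit mysterious, it can help to build the same plot one layer at a time. This is only a sketch of one way to do it. The object names p_base, p_points, p_line and p_final are made up, and it loads ggplot2 (the plotting package that comes as part of tidyverse) directly. The data are typed in by hand so the example runs on its own.

```r
library(ggplot2)   # The plotting package that tidyverse loads for you

# The workshop data, typed in by hand (same values as the manual data frame earlier)
mydata = data.frame(time_outside = c(120, 101, 85, 55, 41, 22, 123, 90, 66, 32,
    12, 130, 90, 50, 33, 70, 65, 54, 111, 24, 11, 115, 9, 80, 129), happiness = c(8,
    7, 6, 5, 4, 1, 10, 5, 6, 4, 5, 6, 4, 3, 5, 7, 7, 6, 8, 3, 4, 10, 2, 6, 10))

# Step 1: tell R which data set to use and which variable goes on which axis
p_base = ggplot(mydata, aes(x = time_outside, y = happiness))

# Step 2: add one point for each row of data
p_points = p_base + geom_point()

# Step 3: add a straight line of best fit ("lm"), without the shaded error band
p_line = p_points + geom_smooth(method = "lm", se = FALSE, color = "black")

# Step 4: add the axis labels
p_final = p_line + labs(x = "Time Spent Outside Per Week (mins)",
                        y = "Happiness Score")

print(p_final)   # Draw the finished plot in the Plots tab
```

Try running print(p_base) or print(p_points) as well to see what the plot looks like after each step.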


PS1010 students: That’s it for today’s workshop! Remember to save your R project and script. Check where your project has been saved on your Y: Drive so you know where to find it in the future. You will need these resources for the quiz.